{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "db0855d0",
   "metadata": {},
   "source": [
    "# Supabase Vector Store\n",
    "In this notebook we are going to show how to use [Vecs](https://supabase.github.io/vecs/) to perform vector searches in LlamaIndex.  \n",
    "See [this guide](https://supabase.github.io/vecs/hosting/) for instructions on hosting a database on Supabase "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c2d1c538",
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "import sys\n",
    "\n",
    "# Uncomment to see debug logs\n",
    "# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n",
    "# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
    "\n",
    "from llama_index import SimpleDirectoryReader, Document, StorageContext\n",
    "from llama_index.indices.vector_store import VectorStoreIndex\n",
    "from llama_index.vector_stores import SupabaseVectorStore\n",
    "import textwrap"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "26c71b6d",
   "metadata": {},
   "source": [
    "### Setup OpenAI\n",
    "The first step is to configure the OpenAI key. It will be used to created embeddings for the documents loaded into the index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "67b86621",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"[your_openai_api_key]\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f7010b1d-d1bb-4f08-9309-a328bb4ea396",
   "metadata": {},
   "source": [
    "### Loading documents\n",
    "Load the documents stored in the `../data/paul_graham/` using the SimpleDirectoryReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "c154dd4b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document ID: fb056993-ee9e-4463-80b4-32cf9509d1d8 Document Hash: 77ae91ab542f3abb308c4d7c77c9bc4c9ad0ccd63144802b7cbe7e1bb3a4094e\n"
     ]
    }
   ],
   "source": [
    "documents = SimpleDirectoryReader(\"../data/paul_graham/\").load_data()\n",
    "print(\"Document ID:\", documents[0].doc_id, \"Document Hash:\", documents[0].doc_hash)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c0232fd1",
   "metadata": {},
   "source": [
    "### Create an index backed by Supabase's vector store. \n",
    "This will work with all Postgres providers that support pgvector.\n",
    "If the collection does not exist, we will attempt to create a new collection \n",
    "\n",
    "> Note: you need to pass in the embedding dimension if not using OpenAI's text-embedding-ada-002, e.g. `vector_store = SupabaseVectorStore(..., dimension=...)`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "8731da62",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store = SupabaseVectorStore(\n",
    "    postgres_connection_string=\"postgresql://<user>:<password>@<host>:<port>/<db_name>\",\n",
    "    collection_name=\"base_demo\",\n",
    ")\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8ee4473a-094f-4d0a-a825-e1213db07240",
   "metadata": {},
   "source": [
    "### Query the index\n",
    "We can now ask questions using our index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0a2bcc07",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/suo/miniconda3/envs/llama/lib/python3.9/site-packages/vecs/collection.py:182: UserWarning: Query does not have a covering index for cosine_distance. See Collection.create_index\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"Who is the author?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "8cf55bf7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The author of this text is Paul Graham.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "68cbd239-880e-41a3-98d8-dbb3fab55431",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\"What did the author do growing up?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "fdf5287f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " The author grew up writing essays, learning Italian, exploring Florence, painting people, working\n",
      "with computers, attending RISD, living in a rent-stabilized apartment, building an online store\n",
      "builder, editing Lisp expressions, publishing essays online, writing essays, painting still life,\n",
      "working on spam filters, cooking for groups, and buying a building in Cambridge.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c9407557",
   "metadata": {},
   "source": [
    "## Using metadata filters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "39cae198",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.schema import TextNode\n",
    "\n",
    "nodes = [\n",
    "    TextNode(\n",
    "        \"The Shawshank Redemption\",\n",
    "        metadata={\n",
    "            \"author\": \"Stephen King\",\n",
    "            \"theme\": \"Friendship\",\n",
    "        },\n",
    "    ),\n",
    "    TextNode(\n",
    "        \"The Godfather\",\n",
    "        metadata={\n",
    "            \"director\": \"Francis Ford Coppola\",\n",
    "            \"theme\": \"Mafia\",\n",
    "        },\n",
    "    ),\n",
    "    TextNode(\n",
    "        \"Inception\",\n",
    "        metadata={\n",
    "            \"director\": \"Christopher Nolan\",\n",
    "        },\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "5d58639c",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store = SupabaseVectorStore(\n",
    "    postgres_connection_string=\"postgresql://<user>:<password>@<host>:<port>/<db_name>\",\n",
    "    collection_name=\"metadata_filters_demo\",\n",
    ")\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex(nodes, storage_context=storage_context)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9fb0618b",
   "metadata": {},
   "source": [
    "Define metadata filters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "17b2ac01",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters\n",
    "\n",
    "filters = MetadataFilters(filters=[ExactMatchFilter(key=\"theme\", value=\"Mafia\")])"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d875f6b5",
   "metadata": {},
   "source": [
    "Retrieve from vector store with filters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "79afe7f1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NodeWithScore(node=Node(text='The Godfather', doc_id='f837ed85-aacb-4552-b88a-7c114a5be15d', embedding=None, doc_hash='f8ee912e238a39fe2e620fb232fa27ade1e7f7c819b6d5b9cb26f3dddc75b6c0', extra_info={'theme': 'Mafia', 'director': 'Francis Ford Coppola'}, node_info={'_node_type': '1'}, relationships={}), score=0.20671339734643313)]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "retriever = index.as_retriever(filters=filters)\n",
    "retriever.retrieve(\"What is inception about?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5dcb3ab",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  },
  "vscode": {
   "interpreter": {
    "hash": "38a327e7bea9478b86ff5be1afa4768c851785146a2113bbf2930d1c8dbd310f"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
